53 research outputs found
Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse
Normalization operations are essential for state-of-the-art neural networks
and enable us to train a network from scratch with a large learning rate (LR).
We attempt to explain the real effect of Batch Normalization (BN) from the
perspective of variance transmission by investigating the relationship between
BN and Weights Normalization (WN). In this work, we demonstrate that the
problem of the shift of the average gradient will amplify the variance of every
convolutional (conv) layer. We propose Parametric Weights Standardization
(PWS), a fast module for conv filters that is robust to the mini-batch size, to
solve the shift of the average gradient. PWS can provide the speed-up of BN
while requiring less computation, and it does not change the output of a conv
layer. PWS enables the network to converge fast without normalizing the
outputs. This result enhances the persuasiveness of the shift of the average
gradient and explains why BN works from the perspective of variance
transmission. The code and appendix will be made available at
https://github.com/lyxzzz/PWSConv.
Comment: This paper has been accepted by AAAI2
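The abstract does not give PWS's exact formula; as an illustrative sketch of the family of techniques it belongs to, the snippet below standardizes each conv filter's weights (zero mean, unit variance, then a learnable gain; the names `gain` and `eps` are our assumptions, not the paper's). Centering removes the mean component that the paper links to the shift of the average gradient.

```python
import numpy as np

def standardize_filters(w, gain=1.0, eps=1e-5):
    """Per-filter weight standardization: center and scale each conv
    filter so its weights have zero mean and unit variance, then apply
    a learnable gain. Operating on the weights (not the outputs) leaves
    the conv layer's output formula unchanged."""
    # w has shape (out_channels, in_channels, kh, kw)
    flat = w.reshape(w.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    out = (flat - mean) / (std + eps)
    return gain * out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(0.3, 1.0, size=(8, 3, 3, 3))  # filters with a mean shift
ws = standardize_filters(w)
```

Because this acts on the weights rather than the activations, it needs no batch statistics, which is why such modules are robust to the mini-batch size.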
Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge
In formal procedure of civil cases, the textual materials provided by
different parties describe the development process of the cases. It is a
difficult but necessary task to extract the key information for the cases from
these textual materials and to clarify the dispute focus of related parties.
Currently, officers read the materials manually and use methods such as
keyword searching and regular-expression matching to find the target
information. These approaches are time-consuming and depend heavily on the
officers' prior knowledge and carefulness. To help officers improve their
working efficiency and accuracy, we propose an approach to detect disputes from divorce
cases based on a two-round-labeling event extracting technique in this paper.
We implement the Judicial Intelligent Assistant (JIA) system according to the
proposed approach to 1) automatically extract focus events from divorce case
materials, 2) align events by identifying co-reference among them, and 3)
detect conflicts among events brought by the plaintiff and the defendant. With
the JIA system, it is convenient for judges to determine the disputed issues.
Experimental results demonstrate that the proposed approach and system can
obtain the focus of cases and detect conflicts more effectively and efficiently
compared with the existing method.
Comment: 20 pages
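The three-step pipeline the abstract describes (extract events, align them by co-reference, detect conflicts) can be illustrated with a deliberately simplified sketch of the third step; the event schema and the rule that aligned events with disagreeing claims constitute a dispute are our assumptions, not the paper's actual model.

```python
# Toy sketch of the JIA pipeline's conflict-detection step: after events
# from the plaintiff and defendant have been extracted and aligned
# (steps 1-2), flag aligned event pairs whose claims contradict.
def detect_conflicts(aligned_pairs):
    """aligned_pairs: list of (plaintiff_event, defendant_event) dicts
    sharing the same 'topic'; a conflict is a disagreeing 'claim'."""
    conflicts = []
    for p, d in aligned_pairs:
        if p["topic"] == d["topic"] and p["claim"] != d["claim"]:
            conflicts.append(p["topic"])
    return conflicts

pairs = [
    ({"topic": "child_custody", "claim": "plaintiff"},
     {"topic": "child_custody", "claim": "defendant"}),
    ({"topic": "separation_date", "claim": "2019-05"},
     {"topic": "separation_date", "claim": "2019-05"}),
]
```

Here only `child_custody` is reported as disputed, since both parties agree on the separation date; the real system makes this comparison over events extracted by the two-round-labeling technique.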
Impact of a Diagnostic Pressure Equation Constraint on Tornadic Supercell Thunderstorm Forecasts Initialized Using 3DVAR Radar Data Assimilation
A diagnostic pressure equation constraint has been incorporated into a storm-scale three-dimensional variational (3DVAR) data assimilation system. This diagnostic pressure equation constraint (DPEC) aims to improve dynamic consistency among different model variables so as to produce better data assimilation results and improve the subsequent forecasts. Ge et al. (2012) described the development of DPEC and tested it with idealized experiments. DPEC was also applied to a real supercell case, but only radial velocity was assimilated. In this paper, DPEC is further applied to two real tornadic supercell thunderstorm cases, where both radial velocity and radar reflectivity data are assimilated. The impact of DPEC on radar data assimilation is examined mainly based on the storm forecasts. It is found that the experiments using DPEC generally predict higher low-level vertical vorticity than the experiments not using DPEC near the time of observed tornadoes. Therefore, it is concluded that the use of DPEC improves the forecast of mesocyclone rotation within supercell thunderstorms. The experiments using different weighting coefficients generate similar results. This suggests that DPEC is not very sensitive to the weighting coefficients.
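In weak-constraint variational assimilation, a diagnostic relation like DPEC typically enters the 3DVAR cost function as an additional weighted penalty term. The sketch below shows that generic structure only; the quadratic forms, the toy constraint, and the weight `w_c` are illustrative assumptions, not the actual DPEC formulation.

```python
import numpy as np

def cost_3dvar(x, xb, B_inv, y, H, R_inv, c_residual, w_c):
    """Generic 3DVAR cost with a weak constraint term:
    J(x) = (x - xb)^T B^-1 (x - xb)       background term
         + (H(x) - y)^T R^-1 (H(x) - y)   observation term
         + w_c * ||c_residual(x)||^2      constraint penalty (e.g. DPEC)
    A larger weight w_c pulls the analysis toward satisfying the
    diagnostic constraint, i.e. toward dynamic consistency."""
    db = x - xb
    do = H(x) - y
    r = c_residual(x)
    return db @ B_inv @ db + do @ R_inv @ do + w_c * (r @ r)

# Tiny 2-variable example: identity observation operator and a single
# scalar constraint residual that happens to be satisfied (sum(x) == 3).
x = np.array([1.0, 2.0])
J = cost_3dvar(
    x, np.zeros(2), np.eye(2),
    np.array([1.0, 1.0]), lambda v: v, np.eye(2),
    lambda v: np.array([v.sum() - 3.0]), w_c=10.0)
```

The paper's finding that results are insensitive to the weighting coefficients corresponds to varying `w_c` in this penalty term.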
Security and Energy-aware Collaborative Task Offloading in D2D communication
The device-to-device (D2D) communication technique establishes direct links among mobile devices (MDs) to reduce communication delay and increase network capacity over the underlying wireless networks. Existing D2D schemes for task offloading focus on system throughput, energy consumption, and delay without considering data security. This paper proposes a Security and Energy-aware Collaborative Task Offloading scheme for D2D communication (Sec2D). Specifically, we first build a novel security model, in terms of the number of CPU cores, CPU frequency, and data size, for measuring the security workload on heterogeneous MDs. Then, we formulate the collaborative task offloading problem that minimizes the time-average delay and energy consumption of MDs while ensuring data security. To meet this goal, the Lyapunov optimization framework is applied to implement online decision-making. Two solutions, a greedy approach and an optimal approach with different time complexities, are proposed to deal with the resulting mixed-integer linear programming (MILP) problem. The theoretical proofs demonstrate that Sec2D follows a [O(1∕V),O(V)] energy-delay tradeoff. Simulation results show that Sec2D can guarantee both data security and system stability in the collaborative D2D communication environment.
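The [O(1/V), O(V)] tradeoff the abstract cites is the standard drift-plus-penalty result of Lyapunov optimization. The toy below shows the generic per-slot decision rule only; the candidate actions and their delay/energy numbers are made-up illustrations, not Sec2D's actual model.

```python
def drift_plus_penalty_step(Q, actions, V, energy_budget):
    """One slot of Lyapunov drift-plus-penalty: pick the action that
    minimizes V * delay + Q * energy, then update the virtual queue Q,
    which tracks cumulative energy spent above the per-slot budget.
    A larger V favors low delay at the cost of a larger queue: the
    [O(1/V), O(V)] energy-delay tradeoff."""
    name, delay, energy = min(actions, key=lambda a: V * a[1] + Q * a[2])
    Q_next = max(Q + energy - energy_budget, 0.0)
    return name, Q_next

# Hypothetical actions: (name, delay cost, energy cost).
actions = [("local", 4.0, 1.0), ("offload", 1.0, 3.0)]
```

With a small queue the energy-hungry, low-delay choice wins; once the queue grows from exceeding the budget, the rule switches to the low-energy choice, which is how online decision-making stabilizes energy without solving the full MILP each slot.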
Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases
Large Language Models (LLMs) have demonstrated remarkable performance in code
completion. However, due to the lack of domain-specific knowledge, they may
be suboptimal at completing code that requires intensive domain knowledge, for
example completing library names. Although several works have confirmed the
effectiveness of fine-tuning techniques for adapting language models to code
completion in specific domains, they are limited by the need to constantly
fine-tune the model as the project iterates.
To address this limitation, in this paper we propose NM-LM, a
retrieval-augmented language model (R-LM) that integrates domain knowledge
into language models without fine-tuning. Unlike previous techniques, our
approach automatically adapts to different language models and domains.
Specifically, it utilizes in-domain code to build a retrieval-based database
decoupled from the LM, and then combines the two through Bayesian inference to
complete the code. Extensive experiments on intra-project and intra-scenario
completion confirm that NM-LM brings appreciable improvements over CodeGPT and
UnixCoder. A deep analysis of our tool, covering response speed, storage
usage, completion of specific code types, and API invocation completion,
confirms that NM-LM delivers satisfactory performance, making it well suited
for domain-adaptive code completion. Furthermore, our approach does not
require direct access to the language model's parameters, so it can seamlessly
integrate with black-box code completion models as a plugin that further
enhances their performance.
Comment: Accepted by ASE202
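The abstract does not spell out the Bayesian combination; a common way retrieval-augmented LMs mix the two next-token distributions is kNN-LM-style interpolation, sketched below. The interpolation weight `lam` and the toy vocabulary are our assumptions, not NM-LM's actual formulation.

```python
import numpy as np

def combine_lm_retrieval(p_lm, p_ret, lam=0.3):
    """Mix the base LM's next-token distribution with one derived from
    a retrieval database: p = lam * p_ret + (1 - lam) * p_lm.
    Because the database is decoupled from the LM, it can be rebuilt
    for a new domain or project without any fine-tuning."""
    p = lam * np.asarray(p_ret) + (1.0 - lam) * np.asarray(p_lm)
    return p / p.sum()  # renormalize against rounding drift

# Toy vocab: ["requests", "numpy", "os"]. The in-domain database pushes
# probability toward the library this project actually uses.
p_lm = [0.5, 0.3, 0.2]
p_ret = [0.1, 0.8, 0.1]
p = combine_lm_retrieval(p_lm, p_ret)
```

Only the output distribution of the base model is needed here, which matches the abstract's point that the approach works with black-box completion models.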
Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning
Self-supervised learning enables networks to learn discriminative features
from massive data itself. Most state-of-the-art methods maximize the similarity
between two augmentations of one image based on contrastive learning.
Exploiting the consistency between the two augmentations removes the burden of
manual annotation. Contrastive learning uses instance-level
information to learn robust features. However, the learned information is
probably confined to different views of the same instance. In this paper, we
attempt to leverage the similarity between two distinct images to boost
representation in self-supervised learning. In contrast to instance-level
information, the similarity between two distinct images may provide more useful
information. Besides, we analyze the relation between similarity loss and
feature-level cross-entropy loss. These two losses are essential for most deep
learning methods. However, the relation between these two losses is not clear.
Similarity loss helps obtain instance-level representation, while feature-level
cross-entropy loss helps mine the similarity between two distinct images. We
provide theoretical analyses and experiments to show that a suitable
combination of these two losses can achieve state-of-the-art results. Code is
available at https://github.com/guijiejie/ICCL.
Comment: This paper is accepted by IEEE Transactions on Image Processing
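The two losses the abstract relates can be written in a few lines: a cosine-similarity loss between embeddings of two augmented views (instance-level) and a feature-level cross-entropy against cluster assignments, which lets similar but distinct images share a target. The combination weight `alpha` is illustrative, not the paper's tuned value.

```python
import numpy as np

def similarity_loss(z1, z2):
    """Negative cosine similarity between embeddings of two
    augmentations of the same image (instance-level term)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return -np.mean(np.sum(z1 * z2, axis=1))

def feature_cross_entropy(logits, assignments):
    """Feature-level cross-entropy against (soft) cluster assignments;
    distinct images assigned to the same cluster pull together."""
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(assignments * logp, axis=1))

def combined_loss(z1, z2, logits, assignments, alpha=0.5):
    # Weighted combination of the two terms; alpha here is illustrative.
    return (alpha * similarity_loss(z1, z2)
            + (1 - alpha) * feature_cross_entropy(logits, assignments))
```

Identical views give the minimum similarity loss of -1, while the cross-entropy term is minimized when the logits concentrate on the assigned cluster; the paper's analysis concerns how these two objectives interact when traded off.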
LawBench: Benchmarking Legal Knowledge of Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in various
aspects. However, when applying them to the highly specialized, safety-critical
legal domain, it is unclear how much legal knowledge they possess and whether
they can reliably perform legal-related tasks. To address this gap, we propose
a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously
crafted to precisely assess the LLMs' legal capabilities from three
cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize
needed legal concepts, articles and facts; (2) Legal knowledge understanding:
whether LLMs can comprehend entities, events and relationships within legal
text; (3) Legal knowledge applying: whether LLMs can properly utilize their
legal knowledge and make necessary reasoning steps to solve realistic legal
tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label
classification (SLC), multi-label classification (MLC), regression, extraction
and generation. We perform extensive evaluations of 51 LLMs on LawBench,
including 20 multilingual LLMs, 22 Chinese-oriented LLMs, and 9 legal-specific
LLMs. The results show that GPT-4 remains the best-performing LLM in the legal
domain, surpassing the others by a significant margin. While fine-tuning LLMs
on legal-specific text brings certain improvements, we are still a long way
from obtaining usable and reliable LLMs for legal tasks. All data, model
predictions, and evaluation code are released at
https://github.com/open-compass/LawBench/. We hope this benchmark provides an
in-depth understanding of the LLMs' domain-specific capabilities and speeds up
the development of LLMs in the legal domain.